Defer bwrap mount point cleanup until concurrent sandboxes finish#184
Merged
dylan-conway merged 9 commits intomainfrom Mar 31, 2026
Merged
Defer bwrap mount point cleanup until concurrent sandboxes finish#184dylan-conway merged 9 commits intomainfrom
dylan-conway merged 9 commits intomainfrom
Conversation
apply-seccomp now creates a nested user+PID+mount namespace before applying the seccomp filter. The user command runs as PID 2 under a non-dumpable PID 1 reaper, with /proc remounted so only the inner process tree is visible. This prevents the sandboxed command from ptracing or patching the unfiltered bwrap init, bash wrapper, or socat helpers via /proc/N/mem, regardless of the host's kernel.yama.ptrace_scope setting. Namespace setup failure aborts rather than silently degrading. The BPF filter now also blocks io_uring_setup/enter/register. IORING_OP_SOCKET (Linux 5.19+) creates sockets without going through socket(), and seccomp cannot inspect SQEs in the shared ring, so denying ring creation entirely is the only safe option. The filter generator now accepts an optional target-arch argument so a single builder can emit both x64 and arm64 filters. Prebuilt binaries and filters are regenerated for both architectures.
Regenerated BPF filters from the merged seccomp-unix-block.c source, which now combines the 32-bit masked socket() comparison from main with the io_uring blocking from this branch.
apply-seccomp needs CAP_SYS_ADMIN to unshare PID+mount namespaces. The original approach obtained it via unshare(CLONE_NEWUSER), but on hosts where an LSM restricts unprivileged user namespaces (Ubuntu 24.04 with AppArmor defaults), the nested userns is created without capabilities and the setgroups write fails. bwrap now passes --cap-add CAP_SYS_ADMIN (scoped to its user namespace) so apply-seccomp can unshare directly. The nested-userns path remains as a fallback for standalone invocation. apply-seccomp clears the ambient capability set after remounting /proc, so the sandboxed command's execve drops to zero capabilities and cannot umount /proc to reveal the outer mount underneath. Two new tests cover CapEff=0 and umount denial.
Setuid bwrap rejects --cap-add from non-root because it would grant real host capabilities. --unshare-user forces user-namespace mode so the capability is scoped to that namespace and the flag is accepted.
Setuid bwrap rejects --cap-add from non-root, so that path is a dead end. Instead, disable kernel.apparmor_restrict_unprivileged_userns in CI so apply-seccomp's nested-userns path works without any bwrap cooperation. This matches what production Ubuntu 24.04 users need to do anyway, now documented in the README.
reap_until was waiting for all children including orphaned background processes reparented to PID 1, which hung the sandbox when the user command backgrounded something long-running and then exited. Return immediately when the worker terminates; PID 1 exiting tears down the namespace and SIGKILLs any stragglers.
When two sandboxed commands run concurrently and one finishes first,
cleanupBwrapMountPoints() was deleting mount point files that the
still-running sandbox still depended on. Deleting the mountpoint's
dentry on the host detaches the bind mount in the child namespace
(the dentry is unhashed, so path lookup no longer finds the mount),
so the deny rule stops applying inside the still-running sandbox.
Add an active-sandbox counter: wrapCommandWithSandboxLinux()
increments it, cleanupBwrapMountPoints() decrements it and defers
file deletion until the counter reaches zero. A {force: true} option
bypasses the counter for process-exit and reset().
Also bumps version to 0.0.45.
…ummy-file-race # Conflicts: # package-lock.json # package.json
dylan-conway
approved these changes
Mar 31, 2026
dylan-conway
added a commit
that referenced
this pull request
Mar 31, 2026
Merge origin/main (PRs #183, #184, #192). One conflict in mandatory-deny-paths.test.ts — our describe.if(isSupportedPlatform) had prettier reindent the entire describe body, so git saw every line as changed. Resolved by taking main's version and re-applying the skipIf migration. Also migrated the new pid-namespace-isolation.test.ts from #183 to describe.if(isLinux) with hard assertions in beforeAll for the vendor binaries (if apply-seccomp or the BPF filter is missing on Linux CI, that's a real failure, not a skip). --- Capture socat stderr and exit code in bridge init failures. The arm64 CI flake was 'Failed to create bridge sockets after 5 attempts' with no further information — stdio: 'ignore' sent socat's stderr to /dev/null, and the .killed check only detected if WE killed socat, not if it died on its own. Changed: - stdio: ['ignore', 'ignore', 'pipe'] — buffer stderr - check .exitCode !== null instead of .killed to detect crashes - error message now includes each bridge's exit code, whether its socket was bound, and any stderr it produced Next time the flake hits, the CI log will say whether socat crashed (and why) or whether it was genuinely still starting when the 600ms poll budget ran out.
sysid
added a commit
to sysid/sandbox-runtime-improved
that referenced
this pull request
Mar 31, 2026
Rebased sysid branch onto upstream/main (v0.0.45), absorbing: - fix: allow filesystem root traversal when denyRead includes '/' (anthropic-experimental#190) - feat: isolate seccomp workload in nested PID ns and block io_uring (anthropic-experimental#183) - feat: defer bwrap mount point cleanup until concurrent sandboxes finish (anthropic-experimental#184) - chore: bump upstream version to 0.0.45 (anthropic-experimental#192)
sysid
added a commit
to sysid/sandbox-runtime-improved
that referenced
this pull request
Apr 3, 2026
Rebased sysid branch onto upstream/main (v0.0.45), absorbing: - fix: allow filesystem root traversal when denyRead includes '/' (anthropic-experimental#190) - feat: isolate seccomp workload in nested PID ns and block io_uring (anthropic-experimental#183) - feat: defer bwrap mount point cleanup until concurrent sandboxes finish (anthropic-experimental#184) - chore: bump upstream version to 0.0.45 (anthropic-experimental#192)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When two sandboxed commands run concurrently on Linux and one finishes first,
cleanupBwrapMountPoints()was deleting mount point files that the still-running sandbox still depended on. Deleting a mountpoint's dentry on the host detaches the bind mount inside the child namespace (the dentry is unhashed, so path lookup no longer finds the mount), which means the deny rule stops applying inside the still-running sandbox for the remainder of its execution.This PR adds an active-sandbox counter:
wrapCommandWithSandboxLinux()increments it for every bwrap-using invocation (after the no-restrictions early return), and decrements in its catch block if wrapping failscleanupBwrapMountPoints()decrements it and defers file deletion until the counter reaches zero{ force: true }option bypasses the counter for the process-exit handler andreset()The public API (
wrapWithSandboxreturnsPromise<string>,cleanupAfterCommand()takes no args) is unchanged — callers that already pair wrap/cleanup 1:1 need no changes.Also bumps the package version to 0.0.45.
Note
This PR is stacked on top of #183. The diff against
mainwill include those commits until #183 merges; the actual delta for this PR is the last commit on the branch.Test plan
test/sandbox/mandatory-deny-paths.test.tsthat exercise concurrent sandbox lifecycles:defers mount point cleanup while another sandbox is still running— long-running sandbox A, short sandbox B finishes and triggers cleanup, A's deny rule still holdsdefers cleanup when two sandboxes share the same non-existent deny path— both wraps see the path as non-existentdeferred cleanup runs once all concurrent sandboxes finish— verifies ghost files are eventually removedmandatory-deny-paths.test.tsstill pass (addedafterEachwith{force: true}for test isolation)symlink-write-path,allow-read,symlink-boundarynpx tsc --noEmitcleannpx eslintclean